Epitope-based peptide vaccine design and elucidation of novel compounds against 3C like protein of SARS-CoV-2

Coronaviruses (CoVs) are positive-stranded RNA viruses with club-shaped projections on their surface. CoVs are pathogenic viruses that infect several animals and plant organisms, as well as humans, in whom they cause lethal respiratory dysfunctions. A novel strain of CoV has been reported and named SARS-CoV-2. Numerous COVID-19 cases have been reported all over the world, and COVID-19 has a high mortality rate. In the present study, immunoinformatics techniques were utilized to predict antigenic epitopes against the 3C-like protein. B-cell and cytotoxic T-lymphocyte (CTL) epitopes were designed computationally against SARS-CoV-2. Multiple sequence alignment (MSA) of seven complete strains (HCoV-229E, HCoV-NL63, HCoV-OC43, HCoV-HKU1, SARS-CoV, MERS-CoV, and SARS-CoV-2) was performed to elucidate the binding domain and interacting residues. MHC-I binding epitopes were evaluated by analyzing the binding affinity of the top-ranked peptides for the HLA molecule. Using the docked complexes of the CTL epitopes with the antigenic sites, the binding relationship and affinity of the top-ranked predicted peptides with the MHC-I HLA protein were investigated. Molecular docking analyses were conducted on a ZINC database library, and the twelve compounds with the lowest binding energy were scrutinized. In conclusion, twelve CTL epitopes (GTDLEGNFY, TVNVLAWLY, GSVGFNIDY, SEDMLNPNY, LSQTGIAV, VLDMCASLK, LTQDHVDIL, TTLNDFNLV, CTSEDMLNP, TTITVNVLA, YNGSPSGVY, and SMQNCVLKL) were identified against SARS-CoV-2.


Introduction
Viral parentage is used to inhibit a range of diseases and even serves as a tool for the study of previously unknown pathogens. Numerous novel viruses possessing cytotoxic properties have been discovered, including those that do not replicate in cell culture and those that induce cytopathic effects (CPE) [1]. Coronaviridae viruses are RNA-encapsulated viruses belonging to the

Sequence retrieval
The amino acid sequence of the SARS-CoV-2 3C-like protease (PDB 6M2N) was retrieved from the Protein Data Bank (PDB) [11,12], and the X-ray crystallographic structure of the selected protein (306 residues) was retrieved at a resolution of 2.20 Å. ProtParam was used to evaluate the biochemical properties of the target protein [13].
Surface accessibility and hydrophilicity are significant properties of B-cell epitopes [22]. These properties were evaluated through the Immune Epitope Database and Analysis Resource (IEDB) by using the Parker hydrophilicity prediction (PHP), the Karplus and Schulz flexibility prediction [23], the Kolaskar and Tongaonkar antigenicity scale, and the Emini surface accessibility prediction [24]. Conformational B-cell epitopes were predicted with ElliPro from the IEDB analysis resource, which combines three methods: approximation of the protein shape, calculation of the residue protrusion index (PI), and clustering of neighbouring residues based on their PI values [24]. A minimum score of 0.5 and a maximum distance of 6 Å were used as criteria [25].

Prediction of cytotoxic T-lymphocyte (CTL) epitopes
CTL epitopes were predicted by using the NetCTL 1.2 server [4]. NetCTL 1.2 was run with supertype A1, a C-terminal cleavage weight of 0.15, a TAP transport efficiency weight of 0.05, and an epitope prediction threshold of 0.75. CTLs are activated by antigenic peptides presented on the surface of MHC molecules. The server combines the prediction of transporter associated with antigen processing (TAP) transport efficiency, proteasomal C-terminal cleavage, and MHC class I binding into one integrated system. FASTA sequences from the selected species were utilized to analyze the human leucocyte antigen (HLA) alleles and the length of the polypeptide sequence. TAP transport efficiency was predicted by using a weight matrix, and artificial neural networks were used to predict proteasomal C-terminal cleavage and MHC class I binding [4].
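The way the three predictions and the stated weights fit together can be illustrated with a short sketch. It assumes that the combined score is a weighted sum of the MHC class I binding score, the C-terminal cleavage score, and the TAP transport score, and that a peptide is called an epitope when the combined score reaches the threshold. The class and function names are hypothetical and are not part of NetCTL; only the weights (0.15 and 0.05) and the threshold (0.75) come from the text.

```python
from dataclasses import dataclass

# Weights and threshold quoted in the text for the NetCTL 1.2 run.
CLEAVAGE_WEIGHT = 0.15
TAP_WEIGHT = 0.05
EPITOPE_THRESHOLD = 0.75


@dataclass(frozen=True)
class PeptideScores:
    """Component scores for one peptide (hypothetical container, not a NetCTL type)."""
    peptide: str
    mhc_binding: float  # MHC class I binding score
    cleavage: float     # proteasomal C-terminal cleavage score
    tap: float          # TAP transport efficiency score


def combined_score(scores: PeptideScores) -> float:
    """Assumed weighted sum of the three component scores."""
    return (scores.mhc_binding
            + CLEAVAGE_WEIGHT * scores.cleavage
            + TAP_WEIGHT * scores.tap)


def is_predicted_epitope(scores: PeptideScores) -> bool:
    """Flag a peptide whose combined score reaches the threshold (inclusive by assumption)."""
    return combined_score(scores) >= EPITOPE_THRESHOLD
```

Under these assumptions, the MHC binding term dominates the combined score, while cleavage and TAP transport act as small corrections.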

Coverage of world population
The IEDB server was used to perform worldwide population coverage analyses. Ten different epitopes and area-country-ethnicity combinations were used in the global population coverage query. CTL epitopes were evaluated against particular allele sets to cover the selected populations. Japan, China, Italy, Iran, and other countries with high COVID-19 mortality rates were selected for the global population coverage analyses [26].
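The idea behind a population coverage estimate can be sketched as follows. This is a simplified model and not the IEDB implementation: it assumes Hardy-Weinberg proportions at each HLA locus, so that an individual is covered when at least one of the two inherited alleles binds a predicted epitope, and it assumes that loci are independent. The function names and the allele labels in the usage note are hypothetical.

```python
from typing import Dict, Iterable


def locus_coverage(allele_freqs: Dict[str, float], covered: Iterable[str]) -> float:
    """Fraction of individuals carrying at least one covered allele at one locus.

    Assumes Hardy-Weinberg proportions: with p the summed frequency of the
    covered alleles, the uncovered fraction is (1 - p) squared.
    """
    p = sum(allele_freqs.get(allele, 0.0) for allele in set(covered))
    p = max(0.0, min(p, 1.0))  # guard against rounding in frequency tables
    return 1.0 - (1.0 - p) ** 2


def combined_coverage(per_locus: Iterable[float]) -> float:
    """Coverage across several loci, assuming the loci are independent."""
    uncovered = 1.0
    for coverage in per_locus:
        uncovered *= 1.0 - coverage
    return 1.0 - uncovered
```

For example, with placeholder frequencies `{"X*01": 0.2, "X*02": 0.3}` and both alleles covered, `locus_coverage` gives 1 - (1 - 0.5)^2 = 0.75.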

Molecular docking analyses of peptide-MHC protein complexes
The predicted SARS-CoV-2 epitopes were analyzed by molecular docking against target proteins that include the antigenic residues. The 3D structures of the proposed peptides were predicted by using PEP-FOLD3 [27], and 100 simulation runs were conducted to sample various conformations. sOPEP energy scores were used to assess the conformational models clustered by PEP-FOLD3 [28]. The peptides with the highest values throughout the screening procedure were subsequently submitted for docking analyses with MHC class I binding molecules by using PatchDock [29]. All unnecessary receptor atoms were removed from the docked complexes to obtain reliable results, and the remaining complexes were categorized by geometric complementarity. FireDock was employed to further refine the selected docked complexes [30]. The refinement was set up to decrease the number of scoring errors while simultaneously increasing the flexibility of the docking experiments [31]. To determine and evaluate the binding affinity and hydrogen bonding interactions of the docked complexes, PyMOL, Discovery Studio, and UCSF Chimera 1.14 were used for interaction analyses [32].

Molecular docking analyses
For virtual screening and molecular docking analyses, the library of FDA-approved compounds was selected. The ZINC database was used to retrieve the 2122 FDA-approved compounds, which were minimized by using ChemDraw and UCSF Chimera [33]. A non-structural coronavirus protein (PDB 6M2N) that plays a significant role in the replication of SARS-CoV-2 was retrieved. PyRx [33], AutoDock, and AutoDock Vina [34] were used to perform the molecular docking analyses [34]. Root-mean-square deviation (RMSD) values were used to scrutinize the suitable docked complexes. The drug-like physical and chemical properties were assessed by using admetSAR and ADMETlab. UCSF Chimera and Discovery Studio were used to investigate and visualize the interacting residues [35,36].
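The RMSD criterion mentioned above can be illustrated with a minimal sketch. It assumes that two poses of the same ligand are given as equally ordered lists of atomic coordinates in the same reference frame, and it performs no superposition or symmetry correction, which docking programs may apply internally. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two poses with matching atom order."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(pose_a, pose_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(pose_a))
```

A value of 0 means that the two poses coincide atom by atom; larger values indicate poses that differ more in position or orientation.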

Results and discussion
The genome of SARS-CoV-2 is organized into 14 open reading frames (ORFs) encoding 27 proteins and consists of a positive-stranded RNA molecule of approximately 29,900 nucleotides (Fig 1). The majority of the genome contains ORF1a and ORF1b, which code for sixteen distinct non-structural proteins (nsp1-nsp16) within the replicase complex, although a few have alternative and essential functions [37]. The remaining one-third of the genome has four ORFs (spike S, envelope E, membrane M, nucleocapsid N) and ten extra proteins (ORF3a, ORF3b, ORF6, ORF7a, ORF7b, ORF8a, ORF8b, ORF9b, ORF9c, ORF10). Some ORFs overlap or are located within a larger ORF (Fig 1). The 5′ UTR and 3′ UTR (non-coding regions) are present at the two ends of the genome. The UTRs are approximately 230 bases long and play essential regulatory roles [38]. CoVs are among the viruses that cause colds in humans, and CoVs are also responsible for a high mortality rate. SARS-CoV and MERS-CoV were identified in animal sources [39]. On 31 December 2019, a novel strain of CoV was identified and named SARS-CoV-2. The conclusive cause of the pandemic and its consequences remain unclear because the circumstances surrounding the disease change constantly [40].
To construct an epitope vaccine, immunoinformatics approaches were used to predict the relevant antigenic epitopes of the target protein [40]. The goal of the current effort was to predict peptide vaccines by using immunoinformatics approaches to recognize CTL epitopes [41]. Immunoinformatics analyses can identify several vaccine candidates with promising preclinical results by using computational techniques [42].
CTL epitopes have been identified for the development of a peptide vaccine against the HLA-B protein. Epitope-based vaccines have been used to target SARS-CoV-2 structural proteins, and the CTL epitopes of the target proteins were expected to boost the immune response of the host [39]. A non-structural protein (PDB 6M2N) was selected for epitope-based vaccine design because it has a significant role in the replication of SARS-CoV-2. VaxiJen and AllergenFP were used to evaluate the antigenicity and allergenicity of the CTL epitopes.
In China, population coverage estimations of the predicted epitopes showed 16.08 MHC class I coverage with an average hit of 0.48. Numerous epitopes were predicted, and the top-ranked twelve epitopes were selected for further experiments (Tables 1 and 2). Molecular docking analyses were performed for all the selected top-ranked 12 peptides to evaluate the effective binding site (Tables S1 and 2).

Analyses of SARS-CoV-2 surface accessibility
The surface accessibility values of the predicted peptides were >1.0, which indicates that the selected peptides are located on the surface. Surface probability was plotted on the y-axis against sequence position on the x-axis, and the most likely predicted peptides of SARS-CoV-2 were selected on the basis of the y-axis values. The highest score among the predicted peptides was 0.911, for residues 301 to 306 with the sequence SGVTFQ, while the lowest value was 0.508, for residues 1 to 5 with the sequence SGFRK (S1 Fig).

PLOS ONE
Vaccine design and elucidation of novel compounds against 3C like protein of SARS-CoV-2

Surface flexibility of SARS-CoV-2 protein
The atomic vibrational motions in the protein structure, as reflected by the B-factor (temperature factor) values, were calculated and analyzed by using the Karplus and Schulz flexibility method. The stability and organization of the structure were determined from the B-factor values [23]. The B-factor values also indicate the quality of the model: a lower B-factor value suggests a reliable model, whereas a higher B-factor value indicates a less organized and poorly ordered structure [24]. The heptapeptide sequences MESLVPG and SSDVLVN showed the lowest and highest flexibility scores, 0.999 and 1.0, respectively [24].

Prediction of SARS-CoV-2 Parker Hydrophilicity
The Parker hydrophilicity scale, which is derived from peptide retention times measured by reversed-phase HPLC on a C18 column, was used to evaluate the hydrophilicity of the predicted peptides. The association between antigenic sites and hydrophilic regions was calculated through immunological analyses. The hydrophilicity of the SARS-CoV-2 predicted peptides was assessed from a hydrophilicity graph, with the y-axis showing hydrophilicity and the x-axis showing the residue positions (Fig 2). Parker's hydrophilicity prediction has a maximum hydrophilicity score.
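Scale-based predictions of this kind are commonly computed as a sliding-window average of per-residue values along the sequence. The sketch below illustrates that general technique under stated assumptions: the window length of 7 mirrors the heptapeptides reported in this study, the function name is hypothetical, and the per-residue scale must be supplied by the user. No published scale values are reproduced here, and the IEDB implementation may differ in detail.

```python
from typing import Dict, List, Tuple


def window_profile(sequence: str,
                   scale: Dict[str, float],
                   window: int = 7) -> List[Tuple[int, float]]:
    """Return (1-based centre position, mean scale value) for each full window.

    A KeyError is raised when a residue is missing from the supplied scale.
    Sequences shorter than the window yield an empty profile.
    """
    if window < 1:
        raise ValueError("window must be positive")
    values = [scale[residue] for residue in sequence]
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        centre = start + window // 2 + 1  # 1-based position of the central residue
        profile.append((centre, mean))
    return profile
```

Plotting the mean values against the centre positions gives the kind of graph described above, with residue position on the x-axis and the averaged score on the y-axis.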


Antigenicity prediction for SARS-CoV-2 using Kolaskar and Tongaonkar
The heptapeptide 85 CVLKLKV 91, at positions 85 to 91 of the protein sequence, had a score of 1.22 and showed the highest antigenicity, while NGMNGRT, at positions 274 to 280, showed the lowest antigenicity with a score of 0.84.
Table 2. Top-ranked selected discontinuous epitopes, interacting residues, and scores of the predicted discontinuous epitopes.

SARS-CoV-2 structure-based epitope prediction
Antigenicity, epitope prediction, accessibility, and flexibility in the 3D structure were also evaluated to reduce errors [43]. The protein-antibody interactions were investigated for all the selected epitopes, and the top-ranked three SARS-CoV-2 conformational epitopes having a score of >0.7 were further evaluated. The proportion of atoms across the molecular substance and the antibody binding was determined for the target protein by using the pI (isoelectric point value) score [44], and a score of 5.95 was observed after titration. The scores of the three predicted conformational epitopes ranged between 0.685 and 0.714 and were reported along with the names, lengths, and positions of the residues.

Molecular docking analyses
The top-ranked 25 CTL epitopes were identified and comparative molecular docking analyses were conducted. Binding affinities ranging from -23.45 to -32.5 kcal/mol and global energies ranging from -27.73 to -67.09 kcal/mol were observed for the selected CTL epitopes (Table 3). Interestingly, all 25 CTL epitopes docked at similar binding residues. The top-ranked eight docked complexes were evaluated (Fig 3), and similar residues (TYR9, LEU5, ILE7, HIS41, MET49, ASN142, HIS164, ARG188, and GLN189) were observed.

Population coverage analyses
Epitopes linked with particular HLA alleles were evaluated to scrutinize the MHC class I and MHC class II epitopes. The MHC class I and MHC class II epitopes were found to cover 58.73% and 18.06% of the world's population, respectively. MHC class I epitope coverage was highest in the Italian and Chinese populations, at 100% (S1 File).

Multiple Sequence Alignment (MSA)
The genomes of seven coronaviruses were retrieved for MSA to elucidate the region conserved among all of them. It was observed that the binding domain has a conserved region in all of the selected coronavirus strains (S2 and S3 Files).
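Once sequences are aligned, conserved positions can be located by scanning the alignment column by column. The sketch below is an illustration and not the tool used in this study: it assumes that the aligned sequences are already available as equal-length strings with "-" as the gap character, and it reports a column as conserved only when every sequence carries the same non-gap residue. The function name is hypothetical.

```python
from typing import List

GAP = "-"


def conserved_columns(aligned: List[str]) -> List[int]:
    """Return 1-based alignment columns where all sequences share one non-gap residue."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for index in range(length):
        column = {seq[index] for seq in aligned}
        if len(column) == 1 and GAP not in column:
            conserved.append(index + 1)
    return conserved
```

Runs of consecutive conserved columns that overlap the binding residues would then support the observation that the binding domain is conserved across strains.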

Virtual screening and comparative docking
The selected peptides have substantial anti-SARS-CoV-2 potential according to the followed methodology [4]. Virtual screening was performed against the non-structural protein of SARS-CoV-2 (PDB 6M2N) by using the FDA-approved library from the ZINC database. Comparative molecular docking analyses were performed against the complete FDA-approved library. All the docked complexes were evaluated on the basis of the lowest binding energy.

The selected protein plays a significant role in the replication of SARS-CoV-2. Based on extensive in silico analyses, the scrutinized compounds have the ability to inhibit the target protein. The scrutinized compounds followed Lipinski's rule of five and showed better oral bioavailability. The scrutinized compounds were soluble in water at 25 °C. The binding sites and maximum binding affinities of all the selected compounds showed promising results (S2 and S3 Tables).
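The rule-of-five filter can be written down compactly. The sketch below is an illustration and not the admetSAR or ADMETlab workflow: it assumes that the four descriptors have already been computed by such a tool, it uses the commonly cited limits (molecular weight of at most 500 Da, logP of at most 5, at most 5 hydrogen bond donors, and at most 10 hydrogen bond acceptors), and it tolerates one violation by default, which is an assumption about how strictly the rule is applied. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed descriptors for one compound (hypothetical container)."""
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen bond donors
    h_acceptors: int   # hydrogen bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = (
        d.mol_weight > 500.0,
        d.logp > 5.0,
        d.h_donors > 5,
        d.h_acceptors > 10,
    )
    return sum(checks)


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Accept a compound with no more than max_violations violations (assumed default 1)."""
    return lipinski_violations(d) <= max_violations
```

Setting `max_violations=0` gives the strict reading, in which a compound must satisfy all four limits.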
There is an urgent need for an effective cure for coronaviruses. The SARS-CoV-2 pandemic became a medical emergency all over the globe [45]. Peptide inhibitors are of significant interest for vaccine development [46]. The advantages of peptides include lower toxicity, fewer side effects, and faster action than traditional ligand-based medicinal products. Immunoinformatics methodologies help scientists to minimise the laboratory load in a time-saving and cost-effective manner [47]. There have been major advances in in silico drug design over the last decade [48-51]. A large number of biological difficulties have been tackled by the use of different bioinformatics approaches [52-54]. The epitopes of the non-structural protein (PDB 6M2N) were designed, and CTL epitopes were also predicted against SARS-CoV-2. The binding affinities of the predicted peptides for MHC-I were further evaluated through comparative molecular docking analyses. Eight peptides showed effective MHC-I (HLA-B) interactions. Based on the global energy value, twelve peptides with the greatest antigenicity and binding affinities were selected (S2 Table).

Conclusion
The goal of the current effort was to elucidate efficient peptide-based inhibitors against the SARS-CoV-2 non-structural protein. The predicted epitopes were designed, followed by comparative molecular docking studies against MHC-I. Moreover, the interactions of the scrutinized docked complexes were analyzed. In conclusion, 12 epitopes (GTDLEGNFY,

Supporting information
S1 Fig. The probability of a residue surface influenced by accessibility to surfaces. (DOCX)
S1 Table. Predicted CTL epitopes and predicted amino acid residues from the SARS-CoV-2